1  Introduction

This  document  is  the  final  technical  report  for  Phase  II  of  the  DARPA  Computer  Science  Study  Group  (CSSG)  program 
started  by  the  PI  in  the  year  2008.  (Phase  II  itself  began  for  the  PI  in  2009.)  It  follows  the  reporting  requirements 
specified  in  the  award  document  for  the  project. 


2  A  comparison  of  actual  accomplishments  with  the  goals  and  objectives 
established  for  the  grant,  the  findings  of  the  investigator,  or  both. 

In  Phase  I  of  the  DARPA  CSSG  we  developed  an  early  version  of  a  new  algorithm  for  training  multiple  robotic 
agents  to  coordinate  with  each  other  called  multiagent  HyperNEAT  (D’Ambrosio  and  Stanley  2008).  This  approach 
built  upon  Hypercube-based  NeuroEvolution  of  Augmenting  Topologies  (HyperNEAT),  a  new  algorithm  for  evolving 
artificial  neural  networks  that  we  had  introduced  shortly  before  (D’Ambrosio  and  Stanley  2007;  Gauci  and  Stanley 
2007,  2010;  Stanley  et  al.  2009).  The  HyperNEAT  algorithm  has  the  interesting  property  that  it  can  generate  the 
weights  of  neural  connections  based  on  the  locations  of  the  nodes  they  connect,  which  means  that  in  effect  it  generates 
connectivity based on geometry. This capability led to the key insight behind multiagent HyperNEAT: it may
be possible to generate a set of neural networks based on each network’s position on a virtual field. These
positions are analogous to the positions of soccer players on a team, with forwards exhibiting offensive tactics and
fullbacks more defensive ones. In a similar way, HyperNEAT could be used to generate a whole team of neural networks that
share  skills  yet  also  exhibit  role-specific  behaviors.  Such  controllers,  trained  (i.e.  not  programmed  directly)  through 
evolution,  could  in  principle  be  deployed  in  real  robots  and  UGVs  to  perform  tasks  such  as  room  clearing  or  building 
patrol  autonomously.  The  aim  of  Phase  II  was  to  turn  this  idea  into  reality. 
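
The idea of generating each agent's network from its position can be illustrated with a minimal sketch. It is not the project's implementation: the function toy_cppn is a hypothetical hand-written stand-in for an evolved CPPN, the node coordinates are invented, and the even spacing of team positions in [-1, 1] is an assumption made only for illustration.

```python
import math

def toy_cppn(x1, y1, x2, y2, team_pos):
    """Hypothetical stand-in for an evolved CPPN. It maps the coordinates of
    two substrate nodes plus a team position to one connection weight."""
    # Term shared by every agent (common skills).
    shared = math.sin(2.0 * (x1 - x2)) * math.exp(-(y1 - y2) ** 2)
    # Term that varies with team position (role-specific behavior).
    role = team_pos * math.cos(x1 + x2)
    return math.tanh(shared + role)

def build_agent_weights(cppn, inputs, outputs, team_pos):
    """Query the CPPN once per (input, output) node pair for one agent."""
    return [[cppn(xi, yi, xo, yo, team_pos) for (xo, yo) in outputs]
            for (xi, yi) in inputs]

def build_team(cppn, inputs, outputs, team_size):
    """Assumed layout: agents at evenly spaced team positions in [-1, 1]."""
    if team_size == 1:
        positions = [0.0]
    else:
        positions = [-1.0 + 2.0 * i / (team_size - 1) for i in range(team_size)]
    return {p: build_agent_weights(cppn, inputs, outputs, p) for p in positions}

# Invented node layout: three sensor nodes and two effector nodes.
inputs = [(-1.0, -1.0), (0.0, -1.0), (1.0, -1.0)]
outputs = [(-0.5, 1.0), (0.5, 1.0)]
team = build_team(toy_cppn, inputs, outputs, team_size=3)
```

Because every weight comes from the same function, the agents share structure, while the team-position input lets the networks at opposite ends of the field differ.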

The  goals  and  objectives  in  the  original  Phase  II  proposal  were  organized  into  four  milestones  over  two  years. 
These  milestones  are  reviewed  next  in  sequence. 

[Figure 1 panels: (a) 5-Agent r(x) Substrate; (b) 7-Agent r(x) Substrate; (c) z-axis CPPN; (d) 3-Agent z Substrate; (e) 5-Agent z Substrate]

Figure 1: Old and New Multiagent HyperNEAT Geometries. The original formulation (called r(x)), shown in (a), placed ANNs
on the substrate side by side. The problem with this approach is that when new networks are added to scale the size of the team
up (b), the new arrangements overlap with the old divisions. (Notice that the agent ANNs in (b) cross the original dividing lines
between agents.) By introducing a z-axis in the new formulation (c), it becomes possible to scale from e.g. three agents (d) to five
agents (e) without any overlap. In this way, no matter how many agents are introduced, they will never overlap along the z-axis.


2.1  Milestone  1:  Revising  Policy  Geometry  Encoding 

The  first  milestone  was  to  revise  the  original  policy  geometry  encoding  based  on  an  improved  geometric  conception  of 
how  the  artificial  neural  networks  (ANNs)  should  be  arranged  in  space  (which  is  called  the  substrate  in  HyperNEAT). 
Figure  1  gives  a  sense  of  how  the  geometry  was  revised. 

This reorganization, called the revised policy geometry encoding, was completed and proved successful in a variety of
tasks. It became the standard encoding that we used throughout Phase II and continue to use as we enter Phase
III. While the original encoding was published at GECCO-2008 (and won a Best Paper Award there) (D’Ambrosio
and Stanley 2008), the new encoding was introduced at AAMAS-2010 (D’Ambrosio et al. 2010), a top-tier conference
on autonomous agents and multiagent systems. Both publications demonstrated the approach in a coordinated
predator-prey task, but the AAMAS publication also added room clearing, taking a step towards real-world DoD-related
applications. In this way, Milestone 1 was fully satisfied.


2.2  Milestone  2:  Key  Extensions 

This  second  part  of  the  project  focused  on  extensions  to  the  multiagent  HyperNEAT  approach.  The  first  proposed 
extension, seeding, meant starting multiagent evolution from a pre-trained single individual. The idea is that the
pre-trained seed would have specific skills (e.g. chasing a prey in predator-prey) from which the entire team might benefit.
This idea was successfully demonstrated in various predator-prey variants, first at GECCO-2008 (D’Ambrosio and
Stanley 2008) and later in a more sophisticated version of the task that was compared to the SARSA reinforcement
learning algorithm. This later demonstration is currently under review at the Journal of AI Research (JAIR). The
main  result  is  that  it  is  easier  to  train  multiagent  teams  from  a  pre-trained  seed  ANN  than  from  scratch,  confirming  the 
hypothesis  behind  seeding. 

The second proposed extension was called multi-dimensional policy geometry in the original proposal but was
later renamed situational policy geometry, a similar idea with a more specific meaning. The main
idea is that there are ANNs not only for different agents but also for the different situations in which the agents might
find themselves. This conceptual geometry in effect expands the dimensions of the original policy geometry to
include not just different positions on a team but different situations for the same position. The result is that the
same agent in effect possesses several “brains,” one for each situation it might need to confront. This idea is
published in IROS-2011, a major conference on robotics (D’Ambrosio et al. 2011). The publication includes a
real-world demonstration with Khepera III robots (as was proposed). Videos of this demonstration with real robots are at:
http://eplex.cs.ucf.edu/patrolling.html
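
One way to picture situational policy geometry is as an extra input to the weight-generating function, so that the same agent receives a different network for each situation. The sketch below is only illustrative: situational_cppn is a hypothetical hand-written function rather than an evolved CPPN, and the situation names and their coordinates are assumptions, not values from the publication.

```python
import math

# Assumed situation coordinates, chosen only for illustration.
SITUATIONS = {"patrol": -1.0, "return": 1.0}

def situational_cppn(x1, y1, x2, y2, z, s):
    """Hypothetical stand-in for an evolved CPPN with a situation input s
    in addition to node coordinates and team position z."""
    return math.tanh(math.sin(x1 - x2)
                     + z * math.cos(y1 + y2)
                     + s * (x1 * x2 + 0.5))

def build_brains(cppn, inputs, outputs, z):
    """Build one weight matrix per situation for the agent at position z."""
    return {
        name: [[cppn(xi, yi, xo, yo, z, s) for (xo, yo) in outputs]
               for (xi, yi) in inputs]
        for name, s in SITUATIONS.items()
    }

def act(brains, situation, sensor_values):
    """Select the brain for the current situation and compute its outputs."""
    weights = brains[situation]
    n_out = len(weights[0])
    return [math.tanh(sum(v * weights[i][j] for i, v in enumerate(sensor_values)))
            for j in range(n_out)]

# Invented node layout: three sensor nodes and two effector nodes.
inputs = [(-1.0, -1.0), (0.0, -1.0), (1.0, -1.0)]
outputs = [(-0.5, 1.0), (0.5, 1.0)]
brains = build_brains(situational_cppn, inputs, outputs, z=0.0)
patrol_output = act(brains, "patrol", [1.0, 0.0, -1.0])
```

In this sketch the switch between brains is an explicit dictionary lookup; how the situation is detected on a real robot is outside the scope of the example.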

The final proposed extension was called a hive brain. The idea was to connect neurons between different agents so
that they can exchange neural signals with each other through wireless connections. The interesting thing about
such communication is that it is at the level of neural signals rather than through any particular language. Preliminary
results were promising. However, the hive brain is a highly ambitious and complex endeavor, encompassing
novel neural configurations, wireless communications, and multiagent coordination; it also opens up new domains.
Instead of turning it into a publication, we therefore made it part of our Phase III proposal to ARO, which was accepted
(as was the DARPA portion, which focuses on simulation to real-world transfer). The hive brain has thus grown to
become a major initiative in Phase III.

During this time we also began to work with ARL to show that our trained ANN
controllers can indeed work in their Packbots. While this particular exercise in real-world transfer was not specifically
articulated  in  the  original  proposal,  it  became  apparent  as  we  approached  the  Phase  III  proposal  that  ARL  needed 
to  see  that  such  transfer  would  work  before  endorsing  further  collaboration  (which  would  be  a  goal  for  Phase  III). 
Thus  during  this  period  we  also  built  a  Packbot  simulator  because  ARL’s  current  experimental  UGV  is  the  Packbot. 
We  then  ran  a  successful  test  in  which  we  evolved  a  hall  navigation  controller  for  a  Packbot  with  HyperNEAT  in  our 
simulator  and  transferred  it  to  a  real  Packbot  at  ARL,  which  proceeded  to  navigate  a  hallway  at  the  ARL  location. 
This  demonstration  confirmed  that  our  infrastructure  produces  controllers  that  can  work  in  the  real  world,  including  at 
ARL.  This  work  was  performed  in  collaboration  with  Stuart  Young  and  Dave  Baran  at  ARL. 

In  summary,  two  of  the  three  extensions  were  completed  and  the  other  grew  to  become  a  basis  of  the  Phase  III 
project.  Furthermore,  we  successfully  transferred  a  controller  trained  in  our  simulator  to  a  real  Packbot  at  ARL. 


2.3  Milestone  3:  Applications  and  Major  Scaling 

The  focus  of  this  milestone  was  on  scaling  and  DoD-related  applications.  One  of  the  key  motivations  for  multiagent 
HyperNEAT  was  its  scalability.  Because  an  entire  team  could  be  described  through  a  single  compact  encoding,  it 
should  be  possible  to  train  very  large  teams.  Put  another  way,  multiagent  HyperNEAT  actually  learns  a  mapping 
between  position  on  a  team  and  ANN  policy.  Thus  in  principle  it  should  be  possible  to  sample  as  many  positions 
as desired within this mapping, e.g. hundreds of them, and still obtain functional teams. Two kinds of scalability
are relevant. One is pre-training scaling, which means that the size of the team that is trained can be very large.
This  kind  of  scaling  is  important  because  most  traditional  multiagent  training  techniques  struggle  with  training  very 
large  teams  (Conitzer  and  Sandholm  2007;  Littman  1994;  Singh  et  al.  2000;  Stone  and  Sutton  2001).  The  other  kind 
of  scalability  is  post-training  scaling,  which  means  that  new  roles  can  be  interpolated  for  agents  at  positions  that 
were  not  initially  trained.  Multiagent  HyperNEAT  can  perform  such  post-training  interpolations  based  on  the  policy 
geometry  it  learned  during  training.  While  role  interpolation  is  a  heuristic,  in  some  tasks  it  may  be  a  useful  way  to  add 
new  agents  to  a  team  on  the  fly.  The  goal  was  to  train  and  scale  teams  up  to  a  size  of  1,000  agents. 
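
Post-training scaling can be sketched as resampling a learned position-to-policy mapping at more points. The example below is a simplified illustration under stated assumptions: role_policy is a hypothetical smooth function standing in for the mapping that multiagent HyperNEAT learns, each policy is reduced to two numbers, and agents are assumed to sit at evenly spaced positions in [-1, 1].

```python
import math

def team_positions(n):
    """Assumed layout: n agents at evenly spaced positions in [-1, 1]."""
    if n < 1:
        raise ValueError("team size must be at least 1")
    if n == 1:
        return [0.0]
    return [-1.0 + 2.0 * i / (n - 1) for i in range(n)]

def role_policy(z):
    """Hypothetical stand-in for the learned mapping from team position to
    policy parameters (here just two numbers per agent)."""
    return (math.tanh(2.0 * z), math.cos(z))

def sample_team(n):
    """Obtain a team of any size by sampling the same mapping at n positions."""
    return [role_policy(z) for z in team_positions(n)]

trained_team = sample_team(5)  # size used during training
scaled_team = sample_team(7)   # larger team sampled from the same mapping
```

No retraining happens between the two calls: the seven-agent team reuses the same mapping, so agents at positions not sampled during training receive roles that fall between those of their neighbors. Whether such interpolated roles cooperate well is a property of the task, not of the sampling.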

Both  kinds  of  scaling  were  tested  extensively  over  the  course  of  the  project.  Successful  results  from  post-training 
scaling  are  published  in  D’Ambrosio  et  al.  (2010)  in  two  domains.  Teams  are  scaled  to  sizes  of  up  to  1,000  agents. 
Videos from these experiments are at: http://eplex.cs.ucf.edu/mahnaamas2010.html

A larger and more extensive study of scaling with multiagent HyperNEAT in a version of the multiagent
predator-prey domain is in our current journal submission to JAIR. In this paper, multiagent
HyperNEAT is compared to multiagent SARSA in both pre-training and post-training scaling. The paper emphasizes
the significant advantage for multiagent HyperNEAT in pre-training scaling as team size grows larger. SARSA does
not have a mechanism for role interpolation so it naturally does not do as well at post-training scaling, but it also
lags multiagent HyperNEAT in pre-training scaling. Although these results are currently under review, videos of the
comparisons can be seen at:
http://eplex.cs.ucf.edu/comparison.html

One interesting issue that often proved controversial with reviewers is the generality of post-training scaling.
Although many teams trained over the course of our project did scale well post-training, reviewers often pointed out
that  in  some  domains  roles  may  not  be  possible  to  interpolate  post-training  because  of  complex  nonlinearities  in  the 
way  different  roles  cooperate  with  each  other.  This  point  is  important  because  it  highlights  that  while  post-training 
scaling  is  an  unusual  capability  that  is  useful  in  particular  domains,  the  more  general  practical  benefit  across  many 
domains  may  be  in  pre-training  scaling,  i.e.  the  ability  to  train  medium  or  large  teams  without  confronting  the  curse 
of  dimensionality. 

During this period we also began to investigate a new application domain beyond the original room-clearing
domain: patrol and return, in which a team of robots fans out in a building and then individual
robots return to the entrance when called back (e.g. to have their batteries recharged). This domain was ultimately
demonstrated successfully in the previously mentioned IROS paper that also examined situational policy geometry
(D’Ambrosio et al. 2011). Interestingly, the learned policies generalized to different building maps as well. As noted
above, videos of robots in this domain are at: http://eplex.cs.ucf.edu/patrolling.html

Overall, both significant scaling of various types and training in a new domain succeeded.


2.4  Milestone  4:  Room  Clearing  with  Real  Robots 

The  original  proposal  was  to  culminate  with  a  room  clearing  experiment  with  real  Khepera  robots.  The  goal  is  that  the 
Khepera  III  robots  learn  to  enter  a  room  and  spread  out  along  its  perimeter  on  their  own.  The  procedure  is  to  train  the 
team  in  our  custom-designed  simulator  (created  for  Phase  II)  with  only  five  agents  and  then  to  transfer  it  to  a  version  of 
the domain in the real world. Then the team is scaled to seven agents after training (i.e. post-training scaling), forcing
HyperNEAT  to  interpolate  roles  in  the  room-clearing  team  for  the  additional  two  agents.  This  interpolated  scaling  is 
first  tested  in  the  simulator  and  then  in  the  real  world. 

We were able to achieve this entire sequence successfully. A video documenting the team both in the simulator
and in the real world is at: http://www.youtube.com/watch?v=2VaDtU5XVC8

Note that, in addition to room clearing, the patrol and return domain was also validated in the real world, as shown
in the videos noted above at:

http://eplex.cs.ucf.edu/patrolling.html

Both  of  these  real-world  tests  took  significant  engineering,  testing,  and  design.  We  learned  a  great  deal  about  what 
it  takes  to  move  controllers  out  of  simulation  and  into  the  real  world,  knowledge  that  will  serve  us  well  in  Phase  III. 

Thus  we  were  able  to  demonstrate  room  clearing,  patrol  and  return,  and  scaling  up  team  size  without  further 
training,  all  trained  in  simulation  and  transferred  to  real-world  robots.  These  accomplishments  show  that  multiagent 
HyperNEAT  indeed  works  in  the  real  world.  Furthermore,  both  these  tasks  are  relevant  to  real  DoD-related  domains, 
which  in  principle  thus  could  be  tackled  by  military  robots  such  as  Packbots.  Our  additional  successful  transfer  of  a 
hallway navigation ANN to a real Packbot at ARL further supports that the work completed here creates significant potential
for multiagent learning in DoD-related applications. This enterprise now continues with Phase III of the CSSG,
which includes a grant from ARO (Grant No. W911NF-11-1-0489) and the DARPA match (Grant No. N11AP20003).


2.5  Additional  Accomplishments 

Several  other  significant  technologies  were  developed  and  expanded  in  support  of  the  Phase  II  work.  This  section 
documents  this  supporting  work. 

First, the novelty search method, which was introduced by my group shortly before Phase II started (Lehman and
Stanley 2008, 2011a), has proven an effective alternative to traditional objective-based search in some domains. It also
was useful in many of our experiments with multiagent HyperNEAT as an alternative means of exploring the behavior
space in various domains. Over the course of Phase II, a number of enhancements and explorations were made: an
extension called minimal criteria novelty search was introduced (Lehman and Stanley 2010b), novelty search was
proven in genetic programming (Lehman and Stanley 2010a), it was shown to help in evolving adaptive ANNs (Risi
et al. 2010a, 2009) (the 2009 paper won another Best Paper Award), its creativity was demonstrated through evolving
virtual creatures (Lehman and Stanley 2011b), and it was shown to yield more evolvable genomes (Lehman and
Stanley 2011c). Furthermore, we completed a journal article on novelty search (Lehman and Stanley 2011a) (started
before Phase II) and shared it with the genetic programming community (Lehman and Stanley 2011d). This approach
is a significant new area for evolutionary computation in its own right and magnifies the impact of the Phase II work.
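
For readers unfamiliar with novelty search, the core scoring step can be sketched briefly. Instead of rewarding progress toward an objective, novelty search rewards a behavior for being far from previously seen behaviors, commonly measured as the average distance to its k nearest neighbors in behavior space. The behavior vectors, the value of k, and the function name below are illustrative assumptions; this is not the code used in the project.

```python
import math

def novelty(behavior, others, k=3):
    """Average Euclidean distance from `behavior` to its k nearest neighbors
    among `others` (behaviors from the population and archive)."""
    distances = sorted(math.dist(behavior, other) for other in others)
    nearest = distances[:k]
    if not nearest:
        return 0.0
    return sum(nearest) / len(nearest)

# Invented 2-D behavior descriptors, e.g. final (x, y) positions of an agent.
seen = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
familiar_score = novelty((0.05, 0.05), seen)  # close to a cluster of behaviors
unusual_score = novelty((10.0, 10.0), seen)   # far from everything seen so far
```

Individuals with higher scores are preferred for reproduction, which pushes the search toward unexplored regions of behavior space.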

Second,  we  implemented  several  enhancements  to  the  underlying  HyperNEAT  algorithm  that  can  also  apply  to 
multiagent  HyperNEAT.  HyperNEAT  is  a  novel  approach  to  evolving  ANNs  that  opened  many  new  directions  for 
investigation  in  its  own  right,  some  of  which  were  exploited  in  this  supporting  work:  HyperNEAT  was  extended  to 
evolve  plastic  ANNs,  i.e.  ANNs  whose  synapses  change  over  their  lifetime,  and  HyperNEAT  was  also  extended  to 
decide the placement and density of hidden neurons (which it could not do before) in a new version called evolvable
substrate HyperNEAT (ES-HyperNEAT) (Risi et al. 2010b; Risi and Stanley 2011) (the 2010 publication won another
Best Paper Award). Both these extensions are built into our multiagent simulator (built for Phase II) and can be run
with multiagent HyperNEAT. The simulator is freely available at:
http://eplex.cs.ucf.edu/software.html

These  enhancements  and  extensions  are  important  to  the  progress  of  the  field  of  neuroevolution  and  HyperNEAT 
in  general,  and  are  now  available  to  contribute  to  Phase  III. 


2.6  Summary 

Overall,  almost  every  milestone  was  fulfilled  in  its  entirety.  The  only  task  still  ongoing  beyond  Phase  II  is  the  hive 
brain,  but  it  ultimately  formed  the  basis  for  the  new  proposal  to  ARO  (now  funded),  which  means  it  was  a  successful 
stepping  stone  to  further  advancement  as  well.  Furthermore,  many  extensions  and  enhancements  were  also  completed 
and  published  that  went  beyond  what  was  initially  proposed. 


3  Reasons  why  established  goals  were  not  met,  if  appropriate. 

No  significant  goals  were  unmet. 


4  Other  Pertinent  Information 

The research completed in Phase II has now set the stage for Phase III. Thus this research stream is in effect still
ongoing.  Phase  III  includes  funding  from  DARPA  and  ARO.  The  goals  are  to  enhance  simulation  to  real-world  transfer 
and  to  build  upon  the  hive  brain  concept  to  enable  new  DoD-relevant  applications,  which  will  include  collaboration 
with  ARL. 